Issues: huggingface/trl
[Tracking issue] Integrate native liger-kernel losses
#2495
opened Dec 17, 2024 by
qgallouedec
Open
6
[Tracking issue] Wrong loss scaling when accumulating gradient
#2617
opened Jan 23, 2025 by
qgallouedec
Open
Issues list
LoRA Continuous pre-training on 7B Instruct Model
✨ enhancement
New feature or request
⚡ PEFT
Related to PEFT
🏋 SFT
Related to SFT
#3509
opened May 29, 2025 by
sinchanabhat
High KL Divergence in GRPO with GPT2-Style Model (Due to Dropout?)
🏋 GKD
Related to GKD
🏋 GRPO
Related to GRPO
🏋 SFT
Related to SFT
#3500
opened May 27, 2025 by
cliang-huanglab
Converting a conversational dataset into a standard dataset [not working]
🐛 bug
Something isn't working
#3490
opened May 23, 2025 by
nbasyl
5 tasks done
Completions Only Loss is incompatible with use_liger_kernel set as true
🐛 bug
Something isn't working
🏋 SFT
Related to SFT
#3484
opened May 22, 2025 by
arashpreetsinghmor
Vision Fine Tuning Gemma 3 takes Impossibly High VRAM (OOM Error 8xH200)
⚡accelerate
Related to accelerate
🐛 bug
Something isn't working
⚡ PEFT
Related to PEFT
#3481
opened May 22, 2025 by
amanmehra89
5 tasks done
【GRPO】Why are some batches of prompts not involved in training?
#3477
opened May 22, 2025 by
moguizhizi
5 tasks done
Is it possible to make prompts dynamic (or iterable datasets) in GRPO training?
✨ enhancement
New feature or request
🏋 GRPO
Related to GRPO
#3474
opened May 21, 2025 by
onlyjokers
[GPG][new trainer] Add support to new GPG method
✨ enhancement
New feature or request
#3472
opened May 20, 2025 by
lerogo
3 tasks done
[GRPO] bnb quantization + vllm
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
⚡ PEFT
Related to PEFT
#3466
opened May 18, 2025 by
shon-otmazgin-wix
5 tasks done
PPO Training does not improve SFT model outputs (Metrics identical before and after PPO)
🏋 PPO
Related to PPO
🏋 SFT
Related to SFT
#3464
opened May 18, 2025 by
xmriz
Turn off Accelerate acceleration
⚡accelerate
Related to accelerate
🏋 GRPO
Related to GRPO
#3461
opened May 17, 2025 by
seTalent
Out of Memory when GRPO fine-tune Qwen3 4B model on 80G A100 GPU
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#3456
opened May 16, 2025 by
wa008
5 tasks done
PPO training fails when used with accelerate ⚡️ and Deepspeed 🚀
⚡accelerate
Related to accelerate
🚀 deepspeed
Related to deepspeed
🏋 PPO
Related to PPO
🏋 SFT
Related to SFT
#3453
opened May 16, 2025 by
marcellobullo
5 tasks done
GRPO reward=0 and loss=0
🏋 GRPO
Related to GRPO
🏋 Reward
Related to Reward modelling
#3452
opened May 15, 2025 by
LIUyizheSDU
torch distributed training with multi gpus errors in GRPOtrainer
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#3451
opened May 15, 2025 by
jinhonglu
5 tasks done
trl vllm-serve not working on latest.
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#3450
opened May 15, 2025 by
tcapelle
5 tasks done
[GRPO] num_generations
🏋 GRPO
Related to GRPO
❓ question
Seeking clarification or more information
#3443
opened May 13, 2025 by
shon-otmazgin-wix
5 tasks done
Unstructured data grpo training
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#3441
opened May 13, 2025 by
yuyuhua918
The performance has deteriorated significantly after fine-tuning with TRL but increases while using llama-factory
#3432
opened May 11, 2025 by
Hasuer
Add support for asynchronous reward functions in GRPOTrainer (and maybe other trainers)
#3426
opened May 8, 2025 by
ideechy
[GRPO] How to train model using vLLM and model parallelism on one node?
#3424
opened May 8, 2025 by
zhiqihuang
[Community Discussion] Progressive Tasks Datasets with Verification for Agentic RL
🏋 Reward
Related to Reward modelling
#3417
opened May 6, 2025 by
August-murr